4 ways to fix 'tech neck,' according to a physical therapist
Strengthening can help if you're staring at your phone too much, and you don't need a ton of equipment to fix your neck. If you're here seeking relief from tech neck, the forward head posture associated with the use of personal devices, we've got good news and bad news. The good news is you've come to the right place; the bad news is you're probably contributing to it right now.
- Asia > Middle East > Jordan (0.07)
- North America > United States > New York > Albany County > Albany (0.05)
Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective
Wang, Siwei, Shen, Yifei, Sun, Haoran, Feng, Shi, Teng, Shang-Hua, Dong, Li, Hao, Yaru, Chen, Wei
Recent reinforcement learning (RL) methods have substantially enhanced the planning capabilities of Large Language Models (LLMs), yet the theoretical basis for their effectiveness remains elusive. In this work, we investigate RL's benefits and limitations through a tractable graph-based abstraction, focusing on policy gradient (PG) and Q-learning methods. Our theoretical analyses reveal that supervised fine-tuning (SFT) may introduce co-occurrence-based spurious solutions, whereas RL achieves correct planning primarily through exploration, underscoring exploration's role in enabling better generalization. However, we also show that PG suffers from diversity collapse, where output diversity decreases during training and persists even after perfect accuracy is attained. By contrast, Q-learning provides two key advantages: off-policy learning and diversity preservation at convergence. We further demonstrate that careful reward design is necessary to prevent reward hacking in Q-learning. Finally, applying our framework to the real-world planning benchmark Blocksworld, we confirm that these behaviors manifest in practice.

Planning is a fundamental cognitive construct that underpins human intelligence, shaping our ability to organize tasks, coordinate activities, and formulate complex solutions such as mathematical proofs. It enables humans to decompose complex goals into manageable steps, anticipate potential challenges, and maintain coherence during problem solving. Similarly, planning plays a pivotal role in state-of-the-art Large Language Models (LLMs), enhancing their ability to address structured and long-horizon tasks with greater accuracy and reliability. Early generations of LLMs primarily relied on next-token prediction and passive statistical learning, which limited their planning capabilities to short-horizon, reactive responses.
- North America > United States > California (0.14)
- Asia (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
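The diversity-collapse behavior of policy gradient described above can be sketched outside the paper's graph setting with a toy REINFORCE run. The bandit task, action count, and learning rate below are illustrative assumptions, not the authors' setup: two actions are equally correct, and the softmax policy typically concentrates on one of them even after accuracy is essentially perfect.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 6                      # actions; actions 0 and 1 are both "correct plans" (assumed toy task)
correct = {0, 1}
logits = np.zeros(K)
lr = 0.5

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(3000):
    p = softmax(logits)
    a = rng.choice(K, p=p)
    r = 1.0 if a in correct else 0.0
    # REINFORCE: grad of log pi(a) w.r.t. logits is onehot(a) - p
    grad = -p.copy()
    grad[a] += 1.0
    logits += lr * r * grad

p = softmax(logits)
accuracy = p[0] + p[1]                     # probability mass on correct actions -> near 1
ratio = min(p[0], p[1]) / max(p[0], p[1])  # diversity among the two correct actions;
                                           # typically shrinks (rich-get-richer collapse)
```

Both correct actions receive identical rewards, so the collapse is driven purely by sampling noise amplified through the softmax coupling, which is the mechanism the abstract attributes to PG.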
The Decoupled Risk Landscape in Performative Prediction
Sanguino, Javier, Kehrenberg, Thomas, Lozano, Jose A., Quadrianto, Novi
Performative Prediction addresses scenarios where deploying a model induces a distribution shift in the input data, such as individuals modifying their features and reapplying for a bank loan after rejection. The literature has largely taken a theoretical perspective, giving mathematical guarantees for convergence (either to the stable or the optimal point). We believe that visualization of the loss landscape can complement these theoretical advances with practical insights. Therefore, (1) we introduce a simple decoupled risk visualization method inspired by the two-step process that performative prediction entails. Our approach visualizes the risk landscape with respect to two parameter vectors: model parameters and data parameters. We use this method to propose new properties of the points of interest and to examine how existing algorithms traverse the risk landscape and perform under more realistic conditions, including strategic classification with non-linear models. (2) Building on this decoupled risk visualization, we introduce a novel setting - extended Performative Prediction - which captures scenarios where the distribution reacts to a model different from the decision-making one, reflecting the reality that agents often lack full access to the deployed model.
- Europe > United Kingdom > England > East Sussex > Brighton (0.04)
- Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
- Asia > Middle East > Israel > Southern District > Eilat (0.04)
- Workflow (0.66)
- Research Report (0.64)
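As a loose illustration of the decoupled view (not the authors' code), one can grid a closed-form risk over a model-parameter axis and a data-parameter axis, recovering the usual performative risk on the diagonal where the data reacts to the deployed model itself. The Gaussian mean-shift response model and all constants below are assumptions:

```python
import numpy as np

mu, eps = 1.0, 0.5                  # base data mean and performativity strength (assumed)
thetas = np.linspace(-1, 4, 501)    # model-parameter axis
phis = np.linspace(-1, 4, 501)      # data-parameter axis

# Decoupled risk: data drawn from N(mu + eps*phi, 1), squared loss for model theta.
# Closed form: R(theta, phi) = (theta - mu - eps*phi)^2 + 1
T, P = np.meshgrid(thetas, phis, indexing="ij")
R = (T - mu - eps * P) ** 2 + 1.0

# The classical performative risk lives on the diagonal phi = theta.
perf_risk = np.diag(R)
theta_opt = thetas[np.argmin(perf_risk)]   # performatively optimal point on the grid
# A performatively stable point solves theta = mu + eps*theta:
theta_stable = mu / (1 - eps)
```

Plotting `R` as a heatmap with the diagonal overlaid gives exactly the kind of two-axis landscape the abstract proposes; in this simple quadratic model the stable and optimal points happen to coincide.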
Feedforward Learning of Mixture Models
Matthew Lawlor, Steven W. Zucker
We develop a biologically-plausible learning rule that provably converges to the class means of general mixture models. This rule generalizes the classical BCM neural rule within a tensor framework, substantially increasing the generality of the learning problem it solves. It achieves this by incorporating triplets of samples from the mixtures, which provides a novel information-processing interpretation of spike-timing-dependent plasticity (STDP). We provide both proofs of convergence and a close fit to experimental data on STDP.
- North America > United States > Connecticut > New Haven County > New Haven (0.04)
- North America > United States > New York (0.04)
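The triplet idea can be loosely illustrated (this is not the paper's BCM-style rule) by noting that third-order sample products carry class-mean information. A noiseless caricature, with assumed class means and random weights, where each sample sits exactly at its class mean:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two classes with fixed means; samples sit exactly at their class mean
# (a noiseless caricature of a mixture, only to show what triple products encode).
means = np.array([[1.0, 0.0], [0.0, 2.0]])
labels = rng.integers(0, 2, size=5000)
X = means[labels]

# Empirical third-moment tensor E[x (x) x (x) x]
M3 = np.einsum("ni,nj,nk->ijk", X, X, X) / len(X)

# With noiseless samples this equals the weight-averaged rank-1 tensors of the class means,
# so the class means are recoverable from third-order statistics.
w = np.bincount(labels) / len(labels)
M3_expected = sum(
    w[c] * np.einsum("i,j,k->ijk", means[c], means[c], means[c]) for c in range(2)
)
```

With noise, cross terms appear and recovering the means takes the kind of machinery the paper develops; the point of the sketch is only that triplets, unlike pairs, can separate mixture components.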
Tight Lower Bounds and Improved Convergence in Performative Prediction
Khorsandi, Pedram, Gupta, Rushil, Mofakhami, Mehrnaz, Lacoste-Julien, Simon, Gidel, Gauthier
Performative prediction is a framework accounting for the shift in the data distribution induced by the prediction of a model deployed in the real world. Ensuring rapid convergence to a stable solution where the data distribution remains the same after the model deployment is crucial, especially in evolving environments. This paper extends the Repeated Risk Minimization (RRM) framework by utilizing historical datasets from previous retraining snapshots, yielding a class of algorithms that we call Affine Risk Minimizers and enabling convergence to a performatively stable point for a broader class of problems. We introduce a new upper bound for methods that use only the final iteration of the dataset and prove for the first time the tightness of both this new bound and the previous existing bounds within the same regime. We also prove that utilizing historical datasets can surpass the lower bound for last iterate RRM, and empirically observe faster convergence to the stable point on various performative prediction benchmarks. We offer at the same time the first lower bound analysis for RRM within the class of Affine Risk Minimizers, quantifying the potential improvements in convergence speed that could be achieved with other variants in our framework.
- North America > United States > California (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > Canada > Quebec (0.04)
- (2 more...)
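The basic last-iterate RRM recursion the paper builds on can be sketched on a toy performative mean-estimation problem. The mean-shift response model and constants below are illustrative assumptions: deploying theta shifts the population mean to mu + eps*theta, and each refit sets theta to the new mean.

```python
mu, eps = 1.0, 0.5   # assumed toy values; eps < 1 makes the update a contraction

theta = 0.0
history = [theta]
for _ in range(30):
    # Deploying theta shifts the population mean to mu + eps*theta;
    # repeated risk minimization refits the squared-loss minimizer (the mean).
    theta = mu + eps * theta
    history.append(theta)

theta_stable = mu / (1 - eps)   # fixed point of the update
```

Each refit contracts toward the stable point at linear rate eps, which is the kind of last-iterate behavior the paper's upper and lower bounds quantify; Affine Risk Minimizers additionally reuse earlier snapshots to accelerate this contraction.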
On the Conditions for Domain Stability for Machine Learning: a Mathematical Approach
This work proposes a mathematical approach that (re)defines a property of Machine Learning models named stability and determines sufficient conditions to validate it. Machine Learning models are represented as functions, and the characteristics in scope depend upon the domain of the function, which allows us to adopt the theory of topological and metric spaces as a basis. Finally, this work provides some equivalences useful for proving and testing stability in Machine Learning models. The results suggest that whenever stability is aligned with the notion of function smoothness, the stability of Machine Learning models primarily depends upon certain topological, measurable properties of the classification sets within the ML model domain.
- North America > United States > New York > New York County > New York City (0.05)
- Europe > France (0.04)
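In the spirit of tying stability to function smoothness, a small empirical probe can estimate how sharply a model, viewed as a function on its domain, responds to nearby inputs. The model and sampling scheme below are hypothetical stand-ins, not the paper's formalism:

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x):
    # Hypothetical stand-in for a trained ML model viewed as a function on its domain.
    return np.tanh(2.0 * x)

# Empirical smoothness probe: |f(x) - f(y)| / |x - y| over sampled nearby pairs.
xs = rng.uniform(-3, 3, size=2000)
ys = xs + rng.uniform(1e-4, 1e-3, size=2000)   # small positive perturbations
ratios = np.abs(model(xs) - model(ys)) / np.abs(xs - ys)
L_hat = ratios.max()   # crude Lipschitz-constant estimate on this sample
```

For `tanh(2x)` the true Lipschitz constant is 2, so the estimate approaches but never exceeds it; a bounded estimate of this kind is one concrete way to test the smoothness-aligned stability notion on a model's domain.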
Reviews: A Bridging Framework for Model Optimization and Deep Propagation
Paper summary: The paper proposes a learning-based hybrid proximal gradient method for composite minimization problems. Each iteration is divided into two modules: the learning module performs data-fidelity minimization with certain network-based priors; the optimization module then generates strictly convergent propagations by applying proximal-gradient feedback to the output of the learning module. The generated iterates are shown to form a Cauchy sequence converging to the critical points of the original objective. The method is applied to image restoration tasks and its performance evaluated. Comments: The core idea is to develop a learning-based optimization module that incorporates domain knowledge into the conventional proximal gradient descent procedure.
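A minimal sketch of the two-module pattern the review describes, with a trivial shrinkage map standing in for the network-based prior and soft-thresholding as the proximal operator. The problem instance, the stand-in "learned" module, and the objective-comparison fallback are assumptions, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 40, 20
A = rng.normal(size=(m, n))
x_true = np.zeros(n)
x_true[:3] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.01 * rng.normal(size=m)
lam = 0.1
eta = 1.0 / np.linalg.norm(A, 2) ** 2   # step size from the Lipschitz constant of grad f

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(x):
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

def learned_module(x):
    # Hypothetical stand-in for the paper's network-based prior: simple shrinkage.
    return 0.9 * x

x = np.zeros(n)
for _ in range(300):
    # Learning module proposes a point; the optimization module applies a
    # proximal-gradient step to it, falling back to a plain proximal-gradient
    # step whenever the learned proposal does not improve the objective.
    z = learned_module(x)
    z = soft_threshold(z - eta * A.T @ (A @ z - b), eta * lam)
    x_pg = soft_threshold(x - eta * A.T @ (A @ x - b), eta * lam)
    x = z if objective(z) <= objective(x_pg) else x_pg
```

The fallback guarantees monotone objective decrease regardless of the learned module's quality, which mirrors the review's point that proximal-gradient feedback is what supplies the strict convergence guarantee.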
A Dynamic Model of Performative Human-ML Collaboration: Theory and Empirical Evidence
Sühr, Tom, Samadi, Samira, Farronato, Chiara
Machine learning (ML) models are increasingly used in various applications, from recommendation systems in e-commerce to diagnosis prediction in healthcare. In this paper, we present a novel dynamic framework for thinking about the deployment of ML models in a performative, human-ML collaborative system. In our framework, the introduction of ML recommendations changes the data generating process of human decisions, which are only a proxy to the ground truth and which are then used to train future versions of the model. We show that this dynamic process in principle can converge to different stable points, i.e., points where the ML model and the Human+ML system have the same performance. Some of these stable points are suboptimal with respect to the actual ground truth. We conduct an empirical user study with 1,408 participants to showcase this process. In the study, humans solve instances of the knapsack problem with the help of machine learning predictions. This is an ideal setting because we can see how ML models learn to imitate human decisions and how this learning process converges to a stable point. We find that for many levels of ML performance, humans can improve the ML predictions to dynamically reach an equilibrium performance that is around 92% of the maximum knapsack value. We also find that the equilibrium performance could be even higher if humans rationally followed the ML recommendations. Finally, we test whether monetary incentives can increase the quality of human decisions, but we fail to find any positive effect. Our results have practical implications for the deployment of ML models in contexts where human decisions may deviate from the indisputable ground truth.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (0.46)
- Education > Educational Setting (0.46)
- Information Technology > Services (0.34)
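The convergence-to-a-suboptimal-equilibrium story can be caricatured as a one-dimensional fixed-point iteration. The affine dynamics below are a hypothetical stand-in for the Human+ML retraining loop, not the paper's model, with constants chosen to land near a 92%-of-optimal equilibrium like the one reported:

```python
# Toy fixed-point view of the retraining loop (assumed dynamics):
# the next model is trained on Human+ML decisions, whose quality g(p)
# depends on the current model quality p.
def g(p):
    return 0.60 + 0.35 * p   # hypothetical: human base skill plus partial reliance on ML

p = 0.5                      # initial model performance
for _ in range(100):
    p = g(p)

p_star = 0.60 / (1 - 0.35)   # closed-form fixed point of the affine update (~0.923)
```

Because the human decisions feeding retraining are only a proxy for the ground truth, the fixed point sits strictly below 1.0: the loop stabilizes, but at a suboptimal performance level, which is the qualitative phenomenon the study documents.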
Differential Privacy of Noisy (S)GD under Heavy-Tailed Perturbations
Şimşekli, Umut, Gürbüzbalaban, Mert, Yıldırım, Sinan, Zhu, Lingjiong
Injecting heavy-tailed noise to the iterates of stochastic gradient descent (SGD) has received increasing attention over the past few years. While various theoretical properties of the resulting algorithm have been analyzed mainly from learning theory and optimization perspectives, their privacy preservation properties have not yet been established. Aiming to bridge this gap, we provide differential privacy (DP) guarantees for noisy SGD, when the injected noise follows an $\alpha$-stable distribution, which includes a spectrum of heavy-tailed distributions (with infinite variance) as well as the Gaussian distribution. Considering the $(\epsilon, \delta)$-DP framework, we show that SGD with heavy-tailed perturbations achieves $(0, \tilde{\mathcal{O}}(1/n))$-DP for a broad class of loss functions which can be non-convex, where $n$ is the number of data points. As a remarkable byproduct, contrary to prior work that necessitates bounded sensitivity for the gradients or clipping the iterates, our theory reveals that under mild assumptions, such a projection step is not actually necessary. We illustrate that the heavy-tailed noising mechanism achieves similar DP guarantees compared to the Gaussian case, which suggests that it can be a viable alternative to its light-tailed counterparts.
- North America > United States > New York (0.04)
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- North America > United States > Florida > Leon County > Tallahassee (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)
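A minimal sketch of the noising mechanism: symmetric alpha-stable perturbations, generated with the Chambers-Mallows-Stuck method, injected into SGD on a toy least-squares problem. The problem instance and hyperparameters are assumptions; note that, in line with the abstract, no clipping or projection step is applied.

```python
import numpy as np

rng = np.random.default_rng(0)

def sas_noise(alpha, size):
    """Symmetric alpha-stable samples via the Chambers-Mallows-Stuck method (alpha != 1)."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * V) / np.cos(V) ** (1 / alpha)
            * (np.cos((1 - alpha) * V) / W) ** ((1 - alpha) / alpha))

# Noisy SGD on a toy least-squares problem, perturbed with alpha-stable noise.
n, d = 200, 5
X = rng.normal(size=(n, d))
theta_true = np.ones(d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

alpha, lr, sigma = 1.8, 0.01, 0.01   # assumed hyperparameters; alpha < 2 gives infinite variance
theta = np.zeros(d)
for t in range(2000):
    i = rng.integers(n)                      # single-sample stochastic gradient
    grad = (X[i] @ theta - y[i]) * X[i]
    theta = theta - lr * grad + lr * sigma * sas_noise(alpha, d)
```

Setting `alpha = 2.0` in the same generator recovers (scaled) Gaussian noise, so the sketch covers the whole spectrum of distributions the abstract refers to, from light-tailed to heavy-tailed.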